CUDA Programming Guide: Fundamentals of CUDA Kernel Development

CUDA kernel development begins with the definition of a kernel: a special C++ function designed to execute in parallel across the very large number of cores in an NVIDIA GPU. Kernels are the basic unit of work in the CUDA programming model, serving as the bridge where serial host logic hands off to massively parallel device execution.

1. The __global__ Specifier

The __global__ declaration specifier is a required qualifier that tells the compiler to generate code for the GPU while keeping the function's entry point visible to the CPU. Functions that execute on the GPU and can be called from the host are called kernels.
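
As a minimal sketch of the specifier in use, the program below declares one kernel and launches it from host code. The names helloKernel and launchHello are hypothetical, chosen only for illustration, and the launch configuration of 2 blocks with 4 threads each is arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// helloKernel is a hypothetical name. __global__ compiles the body for the
// GPU while leaving the function launchable from host code.
__global__ void helloKernel()
{
    // Device-side printf: every GPU thread reports its own indices.
    printf("block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

// Launches the kernel and returns 0 on success, 1 on any CUDA error.
int launchHello()
{
    helloKernel<<<2, 4>>>();  // 2 blocks x 4 threads = 8 GPU threads

    // cudaGetLastError reports launch failures (for example, a bad configuration).
    cudaError_t err = cudaGetLastError();

    // cudaDeviceSynchronize waits for the kernel to finish, reports execution
    // errors, and flushes the device-side printf buffer.
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    return launchHello();
}
```

Assuming the file is saved as hello.cu, it can be built with `nvcc hello.cu -o hello`. The order in which the eight lines appear is not guaranteed, because the threads run in parallel.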

2. Execution Environment

Kernels are dispatched to and executed on Streaming Multiprocessors (SMs). The SM is the primary compute engine of an NVIDIA GPU and is responsible for managing hundreds to thousands of concurrent threads. Each SM receives thread blocks and schedules their threads onto its processing cores.
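
To see how many SMs a particular GPU has, the CUDA runtime can be queried from host code. The sketch below assumes device index 0 is the GPU of interest; smCount is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of SMs on the given device,
// or -1 if the properties cannot be read.
int smCount(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.multiProcessorCount;
}

int main()
{
    const int device = 0;  // assumption: the first visible GPU
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device:                 %s\n", prop.name);
    printf("Streaming MPs (SMs):    %d\n", smCount(device));
    printf("Max resident threads/SM: %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Warp size:              %d\n", prop.warpSize);
    return 0;
}
```

The values differ from one GPU model to the next, so they are worth checking before choosing a launch configuration.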

Syntax rule: Kernels must return void. Because they run asynchronously with respect to the host, they cannot return a value directly to the CPU; they must write their results to allocated device memory.
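
The rule above can be illustrated with a kernel that hands its results back through a pointer to device memory, which the host then copies out. This is a sketch under stated assumptions: squareKernel and squareOnGpu are hypothetical names, the block size of 256 is an arbitrary choice, and error checking is left out to keep the example short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The kernel returns void; its output goes into the device buffer `out`.
__global__ void squareKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the last, partial block
        out[i] = in[i] * in[i];
    }
}

// Hypothetical host helper: squares every element on the GPU.
// Error checking is omitted for brevity; real code should check each call.
std::vector<float> squareOnGpu(const std::vector<float>& hostIn)
{
    const int n = static_cast<int>(hostIn.size());
    std::vector<float> hostOut(n, 0.0f);
    if (n == 0) {
        return hostOut;
    }

    const size_t bytes = n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                          // arbitrary block size
    const int blocks = (n + threads - 1) / threads;   // round up to cover n
    squareKernel<<<blocks, threads>>>(dIn, dOut, n);  // returns immediately

    // This blocking copy on the default stream runs only after the kernel
    // has finished, so the results are complete when it returns.
    cudaMemcpy(hostOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return hostOut;
}

int main()
{
    std::vector<float> result = squareOnGpu({1.0f, 2.0f, 3.0f});
    for (float v : result) {
        printf("%f\n", v);
    }
    return 0;
}
```

The host never receives a return value from the kernel itself; the only channel is the memory that both sides agree on.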

QUESTION 1

What is the primary function of the __global__ specifier?

It defines a function that runs on the CPU but is callable from the GPU.

It defines a kernel that runs on the GPU and is callable from the CPU.

It allocates memory on the GPU's SM cache.

It synchronizes all threads in a block.

QUESTION 2

Why must CUDA kernels return void?

Because they execute asynchronously and have no direct path to return values to the Host thread.

To save registers on the SM.

Because GPU memory is read-only.

The NVCC compiler does not support float returns.

QUESTION 3

Which hardware component is responsible for managing and executing threads in a CUDA kernel?

The PCIe Controller.

The Streaming Multiprocessor (SM).

The Host RAM controller.

The BIOS.

QUESTION 4

What happens when a Host calls a kernel function?

The CPU halts until the GPU finishes processing.

The GPU creates a clone of the function for every available SM.

The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.

The CPU performs a context switch to the GPU.

QUESTION 5

Which of the following is the correct definition of a CUDA kernel?

A function that executes on the GPU and is invoked from the Host.

A C++ library for file I/O.

A hardware driver for NVIDIA GPUs.

A standard CPU function with the __gpu__ prefix.
